Global discretization of continuous attributes as preprocessing for machine learning

Authors

  • Michal R. Chmielewski
  • Jerzy W. Grzymala-Busse
Abstract

Real-life data are usually stored in databases as real numbers, whereas most inductive learning methods require a small number of attribute values. It is therefore necessary to convert input data sets with continuous attributes into input data sets with discrete attributes. Discretization methods restricted to a single continuous attribute will be called local, while methods that simultaneously convert all continuous attributes will be called global. In this paper, a method of transforming any local discretization method into a global one is presented. A global discretization method based on cluster analysis is presented and compared experimentally with three known local methods transformed into global ones. Experiments include tenfold cross-validation and leaving-one-out methods for ten real-life data sets. © 1996 Elsevier Science Inc.

Keywords: discretization, quantization, continuous attributes, machine learning from examples, rough set theory
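To make the idea of a local method concrete, the following is a minimal Python sketch of equal-width discretization applied to one attribute at a time. It is an illustration only: the abstract does not name the three local methods it compares, so equal-width binning is an assumption, and the function names `equal_width_cuts`, `apply_cuts`, and `discretize_each_attribute` are hypothetical. Running a local method on every column independently, as the last function does, is still local in the abstract's sense; it is not the paper's local-to-global transformation or its cluster-analysis method.

```python
from bisect import bisect_right
from typing import List, Sequence


def equal_width_cuts(values: Sequence[float], k: int) -> List[float]:
    """Return the k-1 cut points that split [min, max] into k equal-width intervals."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant attribute needs no cuts: every value falls in one interval.
        return []
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def apply_cuts(values: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Map each value to the index of the interval it falls into."""
    # A value equal to a cut point is assigned to the interval on its right.
    return [bisect_right(cuts, v) for v in values]


def discretize_each_attribute(rows: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Apply the local method to every column independently (assumption: all columns are continuous)."""
    columns = list(zip(*rows))
    coded = [apply_cuts(col, equal_width_cuts(col, k)) for col in columns]
    return [list(row) for row in zip(*coded)]


if __name__ == "__main__":
    data = [(0.0, 10.0), (1.0, 20.0), (4.0, 30.0)]
    print(discretize_each_attribute(data, k=2))
```

Each column here is cut without looking at the other columns or at the class labels, which is the limitation that a global method, one that converts all continuous attributes simultaneously, is meant to address.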

Similar resources

Discretization of Continuous Attributes in Supervised Learning algorithms

We propose a new algorithm, called CILA, for discretization of continuous attributes. The CILA algorithm can be used with any class-labeled data. The tests performed using the CILA algorithm show that it generates discretization schemes with almost always the highest dependence between the class labels and the discrete intervals, and always with a significantly lower number of intervals, when comp...

MIDCA --- A Discretization Model for Data Preprocessing in Data Mining

The decision tree is one of the most widely used and practical methods in the data mining and machine learning disciplines. However, many discretization algorithms developed in this field are univariate only, which is inadequate for the critical problems particular to the medical domain. In this paper, we propose a new multivariate discretization method called Multivariate Interdependent Di...

Unsupervised Discretization Using Kernel Density Estimation

Discretization, defined as a set of cuts over domains of attributes, represents an important preprocessing task for numeric data analysis. Some Machine Learning algorithms require a discrete feature space but in real-world applications continuous attributes must be handled. To deal with this problem many supervised discretization methods have been proposed but little has been done to synthesize...

Making Better Use of Global Discretization

Before applying learning algorithms to datasets, practitioners often globally discretize any numeric attributes. If the algorithm cannot handle numeric attributes directly, prior discretization is essential. Even if it can, prior discretization often accelerates induction, and may produce simpler and more accurate classifiers. As it is generally done, global discretization denies the learning al...

Dynamic Discretization of Continuous Attributes

Discretization of continuous attributes is an important task for certain types of machine learning algorithms. Bayesian approaches, for instance, require assumptions about data distributions. Decision Trees, on the other hand, require sorting operations to deal with continuous attributes, which largely increase learning times. This paper presents a new method of discretization, whose main char...

Journal title:
  • Int. J. Approx. Reasoning

Volume 15  Issue 

Pages  -

Publication date 1996